Retrieving Domain-Specific Collocations by Co-occurrences and Word Order Constraints
Abstract
In this paper, we describe a method for automatically retrieving collocations from large text corpora. The method has two stages: (1) extracting character strings to serve as units of collocations, and (2) extracting recurrent combinations of those strings as collocations. It retrieves various types of domain-specific collocations simultaneously. The method is practical because it works on plain text and needs no language-specific information, such as lexical knowledge or parts of speech. Experimental results on English and Japanese text corpora show that the method is equally applicable to both languages.
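The two stages can be illustrated with a minimal sketch. The abstract does not say how strings are selected or how recurrence is measured, so everything below is an assumption made for illustration: the function names `extract_units` and `extract_collocations`, the frequency thresholds, the "keep only maximal strings" filter, and the use of one line as the co-occurrence window are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def extract_units(lines, min_len=2, max_len=8, min_freq=2):
    """Stage 1 (assumed form): character strings that recur in the corpus."""
    counts = Counter()
    for line in lines:
        for start in range(len(line)):
            for n in range(min_len, max_len + 1):
                if start + n > len(line):
                    break
                counts[line[start:start + n]] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_freq}
    # Assumption: keep only maximal strings, i.e. drop a string when a
    # longer frequent string contains it and occurs just as often.
    return {
        s for s, c in frequent.items()
        if not any(len(t) > len(s) and s in t and frequent[t] == c
                   for t in frequent)
    }


def extract_collocations(lines, units, min_freq=2):
    """Stage 2 (assumed form): ordered unit pairs that recur across lines."""
    pair_counts = Counter()
    for line in lines:
        # First occurrence of each unit that appears in this line.
        found = sorted((line.find(u), u) for u in units if u in line)
        seen = set()
        for i, (pos_a, a) in enumerate(found):
            for pos_b, b in found[i + 1:]:
                # Word order constraint: a must end before b begins.
                if pos_a + len(a) <= pos_b:
                    seen.add((a, b))
        # Count each ordered pair at most once per line.
        pair_counts.update(seen)
    return {pair: c for pair, c in pair_counts.items() if c >= min_freq}


corpus = ["abc--xyz", "abc++xyz"]
units = extract_units(corpus)
collocations = extract_collocations(corpus, units)
print(units, collocations)
```

Because the sketch works on raw characters and uses no tokenizer, dictionary, or part-of-speech tags, it mirrors the abstract's point that plain text alone is enough input, for English and Japanese alike. A realistic corpus would need a far more careful string-selection criterion than a simple count threshold.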
Related papers
Retrieving Collocations by Co-occurrences and Word Order Constraints
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method retrieves collocations in the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings, in accordance with their word order in a corpus, as collocations. Through the method, a wide range of collocations, especially...
Retrieving Collocations from Text: Xtract
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to ...
A Three-Layered Collocation Extraction Tool and Its Application in China English Studies
We design a three-layered collocation extraction tool by integrating syntactic and semantic knowledge and apply it in China English studies. The tool first extracts peripheral collocations in the frequency layer from dependency triples, then extracts semi-peripheral collocations in the syntactic layer by association measures, and last extracts core collocations in the semantic layer with a simi...
Retrieving Collocations From Korean Text
This paper describes a statistical methodology for automatically retrieving collocations from POS-tagged Korean text using interrupted bigrams. The free word order of Korean makes it hard to identify collocations. We devised four statistics, 'frequency', 'randomness', 'condensation', and 'correlation', to account for the more flexible word order properties of Korean collocations. We extracted meanin...
A Corpus-Based Tool for Exploring Domain-Specific Collocations in English
Coxhead’s (2000) Academic Word List (AWL) has been frequently used in EAP classrooms and re-examined in light of various domain-specific corpora. Although well-received, the AWL has been criticized for ignoring the fact that words tend to show irregular distributions and be used in different ways across disciplines (Hyland and Tse, 2007). One such difference concerns collocations. Academic word...
Journal: Computational Intelligence
Volume: 15
Publication date: 1999